


Creators/Authors contains: "Zhang, Jeffrey"


  1. Free, publicly-accessible full text available March 31, 2026
  2. Abstract Protein language models, like the popular ESM2, are widely used tools for extracting evolution-based protein representations and have achieved significant success on downstream biological tasks. Representations based on sequence and structure models, however, show significant performance differences depending on the downstream task. A major open problem is to obtain representations that best capture both the evolutionary and structural properties of proteins in general. Here we introduce Implicit Structure Model (ISM), a sequence-only input model with structurally-enriched representations that outperforms state-of-the-art sequence models on several well-studied benchmarks including mutation stability assessment and structure prediction. Our key innovations are a microenvironment-based autoencoder for generating structure tokens and a self-supervised training objective that distills these tokens into ESM2’s pre-trained model. We have made ISM’s structure-enriched weights easily available: integrating ISM into any application using ESM2 requires changing only a single line of code. Our code is available at https://github.com/jozhang97/ISM.
    Free, publicly-accessible full text available November 11, 2025
  3. Abstract Engineering stabilized proteins is a fundamental challenge in the development of industrial and pharmaceutical biotechnologies. We present Stability Oracle: a structure-based graph-transformer framework that achieves state-of-the-art (SOTA) performance in accurately identifying thermodynamically stabilizing mutations. Our framework introduces several innovations to overcome well-known challenges in data scarcity and bias, generalization, and computation time, including Thermodynamic Permutations for data augmentation, structural amino acid embeddings that model a mutation with a single structure, and a protein structure-specific attention-bias mechanism that makes transformers a viable alternative to graph neural networks. We provide training/test splits that mitigate data leakage and ensure proper model evaluation. Furthermore, to examine our data engineering contributions, we fine-tune ESM2 representations (Prostata-IFML) and achieve SOTA for sequence-based models. Notably, Stability Oracle outperforms Prostata-IFML even though it was pretrained on 2000× fewer proteins and has 548× fewer parameters. Our framework establishes a path for fine-tuning structure-based transformers to virtually any phenotype, a necessary task for accelerating the development of protein-based biotechnologies.
    Free, publicly-accessible full text available December 1, 2025
  4. Abstract In eukaryotes, linear motor proteins govern intracellular transport and organization. In bacteria, where linear motors involved in spatial regulation are absent, the ParA/MinD family of ATPases organize an array of genetic- and protein-based cellular cargos. The positioning of these cargos has been independently investigated to varying degrees in several bacterial species. However, it remains unclear how multiple ParA/MinD ATPases can coordinate the positioning of diverse cargos in the same cell. Here, we find that over a third of sequenced bacterial genomes encode multiple ParA/MinD ATPases. We identify an organism (Halothiobacillus neapolitanus) with seven ParA/MinD ATPases, demonstrate that five of these are each dedicated to the spatial regulation of a single cellular cargo, and define potential specificity determinants for each system. Furthermore, we show how these positioning reactions can influence each other, stressing the importance of understanding how organelle trafficking, chromosome segregation, and cell division are coordinated in bacterial cells. Together, our data show how multiple ParA/MinD ATPases coexist and function to position a diverse set of fundamental cargos in the same bacterial cell. 
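The ISM abstract (item 2) describes structure tokens produced by a microenvironment-based autoencoder and a self-supervised objective that distills those tokens into ESM2, without giving details. The following is only a minimal sketch under two stated assumptions: that each residue's structure-derived embedding is discretized by nearest-codeword lookup (vector quantization), and that distillation is a per-residue cross-entropy between the sequence model's predicted token distribution and those target tokens. The names `quantize` and `structure_token_loss` are hypothetical and are not taken from the ISM code base.

```python
import math
from typing import List, Sequence


def quantize(embedding: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the nearest codeword (squared Euclidean distance).

    Assumption (hypothetical): structure tokens are codebook indices, as in
    vector quantization; the ISM autoencoder may discretize differently.
    """
    best_index, best_dist = 0, math.inf
    for index, code in enumerate(codebook):
        dist = sum((e - c) ** 2 for e, c in zip(embedding, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one residue's token logits."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def structure_token_loss(
    per_residue_logits: Sequence[Sequence[float]], target_tokens: Sequence[int]
) -> float:
    """Mean per-residue cross-entropy against target structure tokens.

    Assumption (hypothetical): the sequence model emits one logit vector per
    residue, and the targets come from quantizing structure-derived embeddings.
    """
    if len(per_residue_logits) != len(target_tokens):
        raise ValueError("need exactly one target token per residue")
    total = 0.0
    for logits, token in zip(per_residue_logits, target_tokens):
        total -= log_softmax(logits)[token]
    return total / len(target_tokens)


# Example: two residues, three-token vocabulary.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
targets = [quantize([0.9, 0.1], codebook), quantize([0.1, 0.8], codebook)]
loss = structure_token_loss([[0.0, 2.0, 0.0], [0.0, 0.0, 2.0]], targets)
```

Here the targets are `[1, 2]`, and the loss falls toward zero as the model places more probability on each residue's target token.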
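The Stability Oracle abstract (item 3) names Thermodynamic Permutations as a data-augmentation step. One reading, sketched below as an assumption rather than as Stability Oracle's actual code, rests on free energy being a state function: if ΔΔG is measured from the wild type to several substitutions at one position, the ΔΔG between any two of those amino acids follows by subtraction, ΔΔG(a→b) = ΔΔG(wt→b) − ΔΔG(wt→a). The function name `thermodynamic_permutations` and the sign convention (ΔΔG(x→y) taken as the energy of y minus the energy of x) are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def thermodynamic_permutations(
    wild_type: str, ddg_from_wt: Dict[str, float]
) -> List[Tuple[str, str, float]]:
    """Expand wild-type-to-mutant ddG values into all ordered amino-acid pairs.

    Assumed sign convention (hypothetical): ddG(x -> y) = E(y) - E(x), with
    E(wild_type) = 0, so ddG(a -> b) = ddG(wt -> b) - ddG(wt -> a).
    """
    if wild_type in ddg_from_wt:
        raise ValueError("ddg_from_wt must not contain the wild-type residue")
    # Energy of each characterized amino acid relative to the wild type.
    energy = {wild_type: 0.0, **ddg_from_wt}
    # Every ordered (source, destination) pair of distinct amino acids.
    return [
        (src, dst, energy[dst] - energy[src])
        for src, dst in permutations(sorted(energy), 2)
    ]


# Example: wild-type alanine with two measured substitutions at one position.
augmented = thermodynamic_permutations("A", {"V": 1.0, "L": -0.5})
```

With n amino acids (wild type included) characterized at a position, this sketch yields n(n − 1) ordered pairs from the n − 1 wild-type-to-mutant measurements it starts with; the example above turns two measurements into six labeled pairs.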
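The Stability Oracle abstract (item 3) also mentions a protein structure-specific attention-bias mechanism, but does not give its form. The sketch below uses a common generic pattern, assumed here purely for illustration: a penalty proportional to the distance between residues is added to scaled dot-product attention scores before the softmax, so spatially close residues receive more weight. `biased_attention_weights` and its `length_scale` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def biased_attention_weights(
    queries: Sequence[Sequence[float]],
    keys: Sequence[Sequence[float]],
    coords: Sequence[Sequence[float]],
    length_scale: float = 8.0,
) -> List[List[float]]:
    """Row-wise softmax of (q . k / sqrt(d)) minus a distance penalty.

    Assumption (hypothetical): self-attention over the same residues, with a
    bias of -distance / length_scale between residue coordinates (for example
    C-alpha positions in angstroms). Not the Stability Oracle formulation.
    """
    d = len(queries[0])
    weights: List[List[float]] = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            dot = sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            bias = -math.dist(coords[i], coords[j]) / length_scale
            scores.append(dot + bias)
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Example: identical (zero) queries and keys, so only geometry differs.
q = [[0.0, 0.0]] * 3
w = biased_attention_weights(q, q, [[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [16.0, 0.0, 0.0]])
```

Because the content scores are equal here, each row's weights decrease with distance from the attending residue; in a real model the learned dot products and the structural bias would compete.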